ABSTRACT

We present a study of the clustering and halo occupation distribution of BOSS CMASS
galaxies in the redshift range $0.43 < z < 0.7$ drawn from the Final SDSS-III Data
Release. We compare the BOSS results with the predictions of a Halo Abundance
Matching (HAM) clustering model that assigns galaxies to dark matter halos selected
from the large BigMultiDark $N$-body simulation of a flat $\Lambda$CDM Planck
cosmology. We compare the observational data with the simulated ones on a light-cone
constructed from 20 subsequent outputs of the simulation. Observational effects such
as incompleteness, geometry, veto masks and fiber collisions are included in the
model, which reproduces within $1\sigma$ errors the observed monopole of the 2-point
correlation function at all relevant scales: from the smallest scales, 0.5 $h^{-1}$
Mpc, up to scales beyond the Baryonic Acoustic Oscillation feature. This model also
agrees remarkably well with the BOSS galaxy power spectrum (up to $k \sim 1$ $h$
Mpc$^{-1}$) and the three-point correlation function. The quadrupole of the
correlation function presents some tensions with observations. We discuss possible
causes that can explain this disagreement, including target selection effects.
Overall, the standard HAM model describes remarkably well the clustering statistics
of the CMASS sample. We compare the stellar to halo mass relation for the CMASS
sample measured using weak lensing in the CFHT Stripe 82 Survey with the prediction
of our clustering model, and find good agreement within $1\sigma$. The BigMD-BOSS
light-cone, including properties of BOSS galaxies and halo properties, is made
publicly available.

Key words: cosmology: large-scale structure of Universe - galaxies: abundances -
galaxies: halos - methods: numerical


1 INTRODUCTION 

One of the major goals in cosmology is to explain the formation of the large-scale
structure of the Universe. However, the main ingredient that drives this evolution -
the dark matter - can only be probed using the distribution of galaxies, and galaxies
are biased tracers of the matter field. This makes this study challenging. In the
last twenty years, vast amounts of observational data have been obtained, improving
each time the precision of the large-scale structure measurements and demanding ever
more accurate theoretical models. In fact, one of the strongest arguments that we
understand how the large-scale structure forms and evolves is our ability to
reproduce the galaxy clustering through cosmic time, starting from the primordial
Gaussian perturbations. During the last decade, surveys such as the Sloan Digital
Sky Survey (SDSS-I/II/III; York et al. 2000; Eisenstein et al. 2011) have made it
possible to determine the clustering of galaxy populations at scales out to tens of
Mpc and beyond with reasonable accuracy.

The Baryon Oscillations Spectroscopic Survey (BOSS; Dawson et al. 2013) Data Release
12 (DR12; Alam et al. 2015) provides redshifts for 1.5 million massive galaxies over
10,000 deg$^2$ of the sky, at redshifts between 0.15 and 0.75. BOSS DR12 has an
effective volume seven times larger than that of the SDSS-I/II project. These data
provide us with a sample that is statistically large enough to test our theoretical
predictions over a range of scales.

In order to compare the $\Lambda$CDM model and the observational data, it is
necessary to link the galaxy and the dark matter distributions. There are a number
of methods to assign galaxies to the dark matter. State-of-the-art hydrodynamical
simulations, which include detailed galaxy formation descriptions, are
computationally unaffordable for the volumes considered here (e.g., Vogelsberger et
al. 2014; Schaye et al. 2015), and indeed, there are no large samples of simulated
galaxies that can be used to match BOSS. Semi-analytic models (SAMs) are a
computationally cheaper way to populate dark matter halos with galaxies (e.g., Knebe
et al. 2015). These models incorporate some physics of galaxy formation.

The most popular models are based on the statistical relations between galaxies and
dark matter halos. One of the most used models is the halo occupation distribution
(HOD; e.g., Jing et al. 1998; Peacock & Smith 2000; Berlind & Weinberg 2002; Zheng
et al. 2005; Leauthaud et al. 2012; Guo et al. 2014). The main component of the HOD
is the probability, $P(N|M_h)$, that a halo of virial mass $M_h$ hosts $N$ galaxies
with some specified properties. These models have several parameters which allow one
to match the observed clustering.

The model known as Halo Abundance Matching (HAM; Kravtsov et al. 2004; Conroy et al.
2006; Behroozi et al. 2010; Guo et al. 2010; Trujillo-Gomez et al. 2011; Nuza et al.
2013; Reddick et al. 2013) connects observed galaxies to simulated dark matter halos
and subhalos by requiring a correspondence between the luminosity or stellar mass
and a halo property. The assumption of this model is that more luminous (massive)
galaxies are hosted by more massive halos. However, this relation is not one-to-one
because there is a physically motivated scatter between galaxies and dark matter
halos (e.g., Shu et al. 2012). By construction, the method reproduces the observed
luminosity function, LF (or stellar mass function, SMF). HAM relates the luminosity
function (stellar mass function) of an observed sample with the distribution of
halos in an $N$-body simulation. The implemented assignment requires that one works
with samples that are complete in luminosity (stellar mass), or that one has a
precise knowledge of the incompleteness as a function of the luminosity (stellar
mass) of the galaxy sample. Luminous red galaxies (LRGs) are the most massive
galaxies in the universe and they represent the high-mass end of the stellar mass
function. This feature makes this population of galaxies well suited to being
reproduced with abundance matching.

In this paper, we compare the clustering of the BOSS CMASS DR12 sample with
predictions from $N$-body simulations. We use abundance matching to populate the
dark matter halos of the BigMultiDark Planck simulation (BigMDPL; Klypin et al.
2016). In order to include systematic effects from the survey, as well as the proper
evolution of the clustering, we construct light-cones which reproduce the angular
selection function, the radial selection function and the clustering of the monopole
in configuration space. To generate these catalogues we developed the SUrvey
GenerAtoR code (SUGAR). Once the HAM and the light-cone are applied, we compute the
predictions of our model for 2-point statistics and the three-point correlation
function. We also present the prediction of the stellar to halo mass relation and
its intrinsic scatter compared to lensing measurements. The HAM, the BigMDPL and the
methodology to produce light-cones played a key role in the construction of the
MultiDark patchy BOSS DR12 mocks (md-patchy mocks; Kitaura et al. 2016, companion
paper).

In order to have a good estimate of the uncertainties in this work, we use 100
md-patchy mocks. These mocks are produced using five boxes at different redshifts
that are created with the PATCHY code (Kitaura et al. 2014). The PATCHY code can be
decomposed into two parts: 1) computing an approximate dark matter density field,
and 2) populating the dark matter density field with galaxies using the biasing
model. The dark matter density field is estimated using Augmented Lagrangian
Perturbation Theory (ALPT; Kitaura & Heß 2013), which combines second order
perturbation theory (2LPT; see e.g., Buchert 1994; Bouchet et al. 1995; Catelan
1995) and the spherical collapse approximation (see Bernardeau 1994; Mohayaee et al.
2006; Neyrinck 2013). The biasing model includes deterministic bias and stochastic
bias (for details see Kitaura et al. 2014). The velocity field is constructed based
on the displacement field of dark matter particles. The modelling of fingers-of-god
has also been taken into account statistically. The md-patchy mocks are constructed
based on the BigMD simulation with the same cosmology used in this work. The mocks
match the clustering of the galaxy catalogues for each redshift bin (see Kitaura et
al. 2016, companion paper, for details). The BigMultiDark light-cone catalogues of
BOSS CMASS galaxies in the Final DR12 (hereafter BigMD-BOSS light-cone) presented in
this work are publicly available.

This paper is structured as follows: sections 2 and 3 describe the SDSS-III/BOSS
CMASS galaxy sample and the BigMDPL $N$-body cosmological simulation used in this
work. In section 4, we provide details on different observational effects and
briefly describe the SUGAR code. Section 4.1 presents the main ingredients of the
HAM modelling of the CMASS galaxy clustering. A comparison of our results to
observation is shown in section 5. Subsequently, we discuss the principal results in
section 6. Finally, in section 7, we present a summary of our work. For all results
in this work, we use the cosmological parameters $\Omega_m = 0.307$, $\Omega_b =
0.048$, $\Omega_\Lambda = 0.693$.


2 SDSS-III/BOSS CMASS SAMPLE 

The Baryon Oscillations Spectroscopic Survey$^1$ (BOSS; Dawson et al. 2013; Bolton
et al. 2012) is part of the SDSS-III program (Eisenstein et al. 2011). The project
used the 2.5 m aperture Sloan Foundation Telescope at Apache Point Observatory (Gunn
et al. 2006). The telescope used a drift-scanning mosaic CCD camera (Gunn et al.
1998) with five colour-bands, $u, g, r, i, z$ (Fukugita et al. 1996). Spectra are
obtained using the double-armed BOSS spectrographs, which are significantly upgraded
from those used by SDSS-I/II, covering the wavelength range 3600-10000 Å with a
resolving power of 1500 to 2600 (Smee et al. 2013). BOSS provides redshifts for 1.5
million galaxies in 10,000 deg$^2$ divided into two samples: LOWZ and CMASS. The
LOWZ galaxies are selected to be the brightest and reddest of the low-redshift
galaxy population ($z < 0.4$), extending the SDSS-I/II LRGs. The CMASS target
selection is designed to isolate galaxies at higher redshift ($z > 0.4$), most of
them being also luminous red galaxies.

In the present paper, we focus on the CMASS DR12 North Galactic Cap (NGC) sample.
Galaxies are selected from SDSS DR8 imaging (Aihara et al. 2011) according to a
series of colour cuts designed to obtain a sample with approximately “constant
stellar mass” (Reid et al. 2016). The following photometric cuts are applied:


$17.5 < i_{\rm cmod} < 19.9$ (1)

$r_{\rm mod} - i_{\rm mod} < 2$ (2)

$d_\perp > 0.55$ (3)

$i_{\rm fib2} < 21.5$ (4)

$i_{\rm cmod} < 19.86 + 1.6\,(d_\perp - 0.8)$ (5)


where $i$ and $r$ indicate magnitudes and $i_{\rm fib2}$ is the $i$-band magnitude
within a 2$''$ aperture. All magnitudes are corrected for Galactic extinction (via
the Schlegel et al. 1998 dust maps). The subscript mod denotes the “model”
magnitudes and the subscript cmod refers to the “cmodel” magnitudes. The model
magnitudes represent the best fit of the DeVaucouleurs and exponential profile in
the $r$-band (Stoughton et al. 2002) and the cmodel magnitudes denote the
best-fitting linear combination of the exponential and DeVaucouleurs models
(Abazajian et al. 2004). $d_\perp$ is defined as

$d_\perp = r_{\rm mod} - i_{\rm mod} - (g_{\rm mod} - r_{\rm mod})/8.0$ (6)

Star-galaxy separation is performed on the CMASS targets via:

$i_{\rm psf} - i_{\rm mod} > 0.2 + 0.2\,(20.0 - i_{\rm mod})$ (7)

$z_{\rm psf} - z_{\rm mod} > 9.125 - 0.46\,z_{\rm mod}$ (8)

The subscript “psf” refers to Point Spread Function magnitudes. The CMASS sample
contains galaxies with redshift $z > 0.4$, with the number density peaking at $z
\approx 0.5$. We will concentrate our analysis on the redshift range $0.43 < z <
0.7$ for this sample.

$^1$ http://skyserver.sdss.org/dr12/en/home.aspx
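The photometric cuts (1)-(5) and the star-galaxy separation (7)-(8) can be
collected into a single predicate. The following is a minimal sketch rather than the
BOSS targeting code: the function and argument names are illustrative assumptions,
and all magnitudes are assumed to be already corrected for Galactic extinction.

```python
def d_perp(g_mod, r_mod, i_mod):
    """Colour combination d_perp of equation (6)."""
    return (r_mod - i_mod) - (g_mod - r_mod) / 8.0


def is_cmass_target(g_mod, r_mod, i_mod, z_mod, i_cmod, i_fib2, i_psf, z_psf):
    """Return True if an object passes cuts (1)-(5) and (7)-(8).

    Argument names are hypothetical; they simply mirror the subscripts
    used in the text (mod, cmod, fib2, psf).
    """
    dp = d_perp(g_mod, r_mod, i_mod)
    colour_cuts = (
        17.5 < i_cmod < 19.9                       # (1)
        and r_mod - i_mod < 2.0                    # (2)
        and dp > 0.55                              # (3)
        and i_fib2 < 21.5                          # (4)
        and i_cmod < 19.86 + 1.6 * (dp - 0.8)      # (5)
    )
    star_galaxy = (
        i_psf - i_mod > 0.2 + 0.2 * (20.0 - i_mod)   # (7)
        and z_psf - z_mod > 9.125 - 0.46 * z_mod     # (8)
    )
    return colour_cuts and star_galaxy
```

An object that passes the colour cuts but is nearly point-like (PSF magnitude close
to the model magnitude) is rejected by the last two conditions.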

The BOSS sample is corrected for redshift failures and fiber collisions. In the
following sections, we use the same weights given in Anderson et al. (2014) in order
to correct the clustering signal affected by these systematics (Ross et al. 2012).
The total weight for a galaxy is given by:

$w_g = w_{\rm star}\,w_{\rm see}\,(w_{\rm zf} + w_{\rm cp} - 1)$. (9)

In equation (9), $w_{\rm zf}$ denotes the redshift failure weight and $w_{\rm cp}$
represents the close pair weight. Both quantities start with unit weight. If a
galaxy has a nearest neighbour (of the same target class) with a redshift failure
($w_{\rm zf}$) or whose redshift was not obtained because it was in a close pair
($w_{\rm cp}$), we increase $w_{\rm zf}$ or $w_{\rm cp}$ by one. As found in Ross et
al. (2012), the impact of redshift failures is very small for the CMASS sample; for
this reason, we do not model them in this study. For CMASS, additional weights are
applied to account for the observed systematic relationships between the number
density of observed galaxies and stellar density and seeing (weights $w_{\rm star}$
and $w_{\rm see}$, respectively).
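As a small worked illustration of equation (9), the sketch below builds the close
pair and redshift failure weights from neighbour counts and combines them with the
systematic weights. The helper names are assumptions made for this example, not
functions from the BOSS pipeline.

```python
def close_pair_weight(n_collided_neighbours):
    """w_cp: unit weight plus one for every neighbour lost to a close pair."""
    return 1.0 + n_collided_neighbours


def redshift_failure_weight(n_failed_neighbours):
    """w_zf: unit weight plus one for every neighbour with a redshift failure."""
    return 1.0 + n_failed_neighbours


def total_weight(w_star, w_see, w_zf=1.0, w_cp=1.0):
    """Total galaxy weight of equation (9)."""
    return w_star * w_see * (w_zf + w_cp - 1.0)
```

A galaxy with no lost neighbours and unit systematic weights keeps a total weight of
one, while a galaxy standing in for a single collided neighbour counts twice.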


3 BIGMULTIDARK SIMULATION 

The BigMDPL is one of the MultiDark$^2$ $N$-body simulations described in Klypin et
al. (2016). The BigMDPL was performed with the GADGET-2 code (Springel 2005). This
simulation was created in a box of 2.5 $h^{-1}$ Gpc on a side, with $3840^3$ dark
matter particles. The mass resolution is $2.4\times10^{10}$ $h^{-1}$ $M_\odot$. The
initial conditions, based on initial Gaussian fluctuations, are generated with the
Zeldovich approximation at $z_{\rm init} = 100$. The BigMultiDark suite consists of
four simulations with different sets of cosmological parameters. In this study, we
adopt a flat $\Lambda$CDM model with the Planck cosmological parameters: $\Omega_m =
0.307$, $\Omega_b = 0.048$, $\Omega_\Lambda = 0.693$, $\sigma_8 = 0.829$, $n_s =
0.96$ and a dimensionless Hubble parameter $h = 0.678$ (Klypin et al. 2016). The
simulation provides twenty redshift outputs (snapshots) within the redshift range
$0.43 < z < 0.7$.

For the present analysis, we use the RockStar (Robust Overdensity Calculation using
K-Space Topologically Adaptive Refinement) halo finder (Behroozi et al. 2013a).
Spherical dark matter halos and subhalos are identified using an approach based on
adaptive hierarchical refinement of friends-of-friends groups in six phase-space
dimensions and one time dimension. RockStar computes halo mass using spherical
overdensities of a virial structure. Before calculating halo masses and circular
velocities, the halo finder performs a procedure which removes unbound particles
from the final mass of the halo. RockStar creates particle-based merger trees. The
merger trees algorithm (Behroozi et al. 2013b) was used to estimate the peak
circular velocity over the history of the halo, $V_{\rm peak}$, which we use to
perform the abundance matching.


$^2$ http://www.multidark.org/


4 METHODOLOGY: THE SURVEY GENERATOR CODE

We construct light-cone catalogues from the BigMDPL simulation which reproduce the
clustering measured in the monopole of the redshift-space correlation function from
the BOSS CMASS DR12 sample. For this purpose, we developed the SUrvey GenerAtoR code
(SUGAR), which implements the HAM technique to generate galaxy catalogues from a
dark matter simulation. The code can apply the geometric features of the survey and
selection effects, including stellar mass incompleteness and fiber collision
effects. All the available outputs (snapshots) of the BigMDPL simulation are used,
so that the light-cone has the proper evolution of the clustering.

In the following subsections, we present the ingredients used to produce the
BigMD-BOSS light-cone, which is shown in Figure 1 and Figure 2. We present the HAM
method and the Stellar Mass Function (SMF) adopted in this work. The light-cone
production, the fiber collision assignment and the modelling of the stellar mass
incompleteness are also described.


4.1 Halo Abundance Matching procedure 

We use a HAM technique to populate dark matter halos with galaxies (see e.g., Nuza
et al. 2013). This physically motivated method produces mock galaxy catalogs that in
the past gave good representations of large galaxy samples (see for SDSS, e.g.,
Trujillo-Gomez et al. 2011; Reddick et al. 2013). The basic assumption of this
method is that massive halos host massive galaxies. This allows one to generate a
rank-ordered relation between dark matter halos and galaxies. However, observations
show that this assignment cannot be a one-to-one relation (Shu et al. 2012). In
order to create a more realistic approach, it is necessary to include scatter in
this matching. The HAM can relate galaxy luminosities or stellar masses to a halo
property. In this paper, we use the peak value of the circular velocity over the
history of the halo ($V_{\rm peak}$), which has advantages compared to the halo mass
($M_{\rm halo}$). $M_{\rm halo}$ is well-defined for host halos, but its definition
becomes ambiguous for subhalos. The subhalo mass also depends on the halo finder
used (Trujillo-Gomez et al. 2011; Reddick et al. 2013). In addition to $M_{\rm
halo}$ and $V_{\rm peak}$, HAM can be performed using other quantities such as the
maximum circular velocity of the halo ($V_{\rm max}$), the maximum circular velocity
of the halo at time of accretion ($V_{\rm acc}$) or the halo mass at time of
accretion ($M_{\rm acc}$). Other studies present the effect of the halo property in
the HAM (e.g., Reddick et al. 2013; Guo et al. 2016).

We adopt a modified version of the scatter proposed in 
Nuza et al. (2013). Our implementation of the abundance 
matching can be briefly summarised in the following steps: 

(i) For the dark matter halos, we define a scattered $V_{\rm peak}$, which is used
only to assign stellar mass to the halos. This scattered quantity is defined by:

$V_{\rm peak}^{\rm scat} = (1 + \mathcal{N}(0, \sigma_{\rm HAM}))\,V_{\rm peak}$, (10)

where $\mathcal{N}$ is a random number drawn from a Gaussian distribution with mean
0 and standard deviation $\sigma_{\rm HAM}(V_{\rm peak}|M_*)$.

Figure 1. Left panel: Sky area covered by the BigMD-BOSS light-cone. This region
includes the BOSS CMASS DR12 geometry and veto masks. Right panel: Sky area covered
by the BOSS CMASS DR12 sample. Colours indicate the angular number density, which is
normalised by the most dense pixel. Each pixel has an angular area of 1 deg$^2$.
BigMD-BOSS light-cone uses the same mask as the BOSS CMASS DR12, including angular
completeness and veto masks.

Figure 2. Pie plot of the BigMD-BOSS light-cone (left panel) and the BOSS CMASS DR12
data (right panel). Both figures were made with 2 deg of thickness (DEC coordinate).


(ii) Sort the catalogue by $V_{\rm peak}^{\rm scat}$, starting from the object with
the largest velocity and continuing down until reaching all the available objects.
Use this catalogue to construct the cumulative number density of the halos as a
function of $V_{\rm peak}^{\rm scat}$.

(iii) Compute the cumulative number density of galaxies as a function of the stellar
mass using the adopted SMF (see Section 4.2).

(iv) Finally, construct a monotonic relation between the cumulative number density
functions from steps (ii) and (iii) such that

$n_{\rm gal}(> M_*^i) = n_{\rm halo}(> V_{\rm peak}^{{\rm scat},i})$. (11)

This relation implies that a halo with $V_{\rm peak}^{{\rm scat},i}$ will contain a
galaxy with stellar mass $M_*^i$.

This assignment is monotonic between $V_{\rm peak}^{\rm scat}$ and $M_*$, but not
between $V_{\rm peak}$ and $M_*$. The relation between these two quantities is
mediated by the scatter parameter, $\sigma_{\rm HAM}(V_{\rm peak}|M_*)$.
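Steps (i)-(iv) can be sketched in a few lines of Python. This is an illustrative
toy, not the SUGAR implementation: the cumulative SMF `toy_cumulative_smf` is a
made-up power law standing in for the adopted SMF, the scatter is taken to be a
single constant `sigma_ham`, and equation (11) is inverted by bisection.

```python
import random


def toy_cumulative_smf(log_mstar):
    """Hypothetical n_gal(> M_*); a placeholder, NOT the SMF of this paper."""
    return 1.0e-3 * 10.0 ** (-2.0 * (log_mstar - 11.0))


def abundance_match(v_peak, volume, n_gal_above, sigma_ham,
                    log_m_lo=10.0, log_m_hi=13.0, seed=0):
    """Assign log10 stellar masses to halos following steps (i)-(iv)."""
    rng = random.Random(seed)
    # Step (i): scattered V_peak, equation (10).
    v_scat = [(1.0 + rng.gauss(0.0, sigma_ham)) * v for v in v_peak]
    # Step (ii): rank halos by scattered velocity, largest first.
    order = sorted(range(len(v_peak)), key=lambda k: v_scat[k], reverse=True)
    log_mstar = [0.0] * len(v_peak)
    for rank, k in enumerate(order):
        n_halo = (rank + 1) / volume      # cumulative halo number density
        # Steps (iii)-(iv): solve n_gal(> M_*) = n_halo, equation (11).
        # n_gal_above is assumed to decrease monotonically with mass.
        lo, hi = log_m_lo, log_m_hi
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if n_gal_above(mid) > n_halo:
                lo = mid
            else:
                hi = mid
        log_mstar[k] = 0.5 * (lo + hi)
    return v_scat, log_mstar
```

With `sigma_ham = 0` the scattered and true velocities coincide and the mapping from
$V_{\rm peak}$ to stellar mass is strictly monotonic; any positive scatter keeps the
monotonic relation only with respect to the scattered velocity.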


4.2 Stellar Mass Function 

We employ the Portsmouth SED-fit DR12 stellar mass catalogue (Maraston et al. 2013)
with the Kroupa initial mass function (Kroupa 2001) to estimate the SMF. The CMASS
large-scale structure (LSS) catalogue does not include the stellar mass information.
For that reason, we matched the BOSS and the LEGACY stellar mass catalogues with the
LSS BOSS CMASS catalogue. In order to identify an SDSS spectrum in the different
catalogues, there are three numbers that determine each galaxy: plate, mjd and
fiberid. We use these three quantities to match the stellar mass catalogues (LEGACY
and BOSS) and the LSS BOSS CMASS catalogue. Once the stellar masses of the observed
sample are assigned, we need to construct an SMF which describes the mass
distribution.

The SMF of the Portsmouth DR12 catalogue differs from the SMFs of previous surveys
(Maraston et al. 2013). Figure 3 shows the mass distribution of the CMASS DR12 for
two different redshift regions. A detailed study of the Portsmouth catalogues and
other stellar mass catalogues was reported by Maraston et al. (2013).

Due to the selection function in the BOSS data, we do not have information on the
shape of the stellar mass function at low masses. There are different ways of
handling this problem. For example, Leauthaud et al. (2016) use the Stripe 82
massive galaxy catalogue to compute the SMF of the BOSS data. We use a different
approach: for the high-mass end we use the Portsmouth stellar masses, and we combine
them with the Guo et al. (2010) results to describe the low-mass regime.
Specifically, the Portsmouth masses are used to compute the SMF for masses larger
than $3.2\times10^{11}\,M_\odot$ (which is the mass range used in the CMASS sample).

Mass range [$M_\odot$]       $\phi^*$ [Mpc$^{-3}$ (log$_{10}M$)$^{-1}$]   $\alpha$   $\log_{10}M^*$ [$M_\odot$]
$\log_{10}M_* < 11.00$       $4.002\times10^{-3}$                       $-0.938$   10.76
$\log_{10}M_* > 11.00$       $2.663\times10^{-3}$                       $-2.447$   11.42

Table 1. Parameters of the double Press-Schechter SMF for this work.

[Figure 3; horizontal axis: $\log_{10}(M_*/M_\odot)$]

Figure 3. Stellar mass function from BOSS CMASS DR12 sample. Circles and squares
show the stellar mass distribution for two redshift bins from the Portsmouth DR12
catalogue. Poissonian errors are included. The solid line shows the estimate of the
SMF for this work, which is constructed combining the high-mass end of the BOSS
sample and Guo et al. (2010) for the low-mass range ($\log_{10}M_* < 11.0$). In
order to compare with a complete sample in the redshift range 0.5 to 0.65, we
include the PRIMUS SMF (triangles) in the low-mass regime.

In order to construct the SMF, we select galaxies in the redshift range $0.55 < z <
0.65$, because this is the most complete range for the CMASS sample (see
Montero-Dorta et al. 2014). We combine the CMASS sample for masses larger than
$2.5\times10^{11}\,M_\odot$ and the SMF from Guo et al. (2010) for low masses. We
fit both results using a double Press-Schechter mass function (Press & Schechter
1974) with the parameters given in Table 1.

Figure 3 presents the SMF used in this work. We also add in Figure 3 the PRIMUS SMF
(Moustakas et al. 2013) in the redshift range $0.5 < z < 0.65$ with the purpose of
comparing the low-mass range of our SMF with a complete sample in the same redshift
and mass ranges. A detailed comparison of the Portsmouth catalogues and other
stellar mass catalogues is presented in Maraston et al. (2013).

In our analysis we do not include redshift evolution of the stellar mass function.
This approximation agrees with results of the PRIMUS survey (Moustakas et al. 2013),
which is a complete survey in the redshift range we study. Moustakas et al. (2013)
show that there is only a small evolution of the stellar mass function in the CMASS
redshift range.


4.3 Production of Light-cones 

We implement a method to generate light-cones from snapshots of cosmological
simulations. This method has been implemented previously (see e.g., Blaizot et al.
2005; Kitzbichler & White 2007). The SUGAR code works with cubic boxes using
positions and velocities of dark matter halos as inputs. We will now describe the
procedure which we use to construct mocks for the CMASS sample.

BigMD-BOSS light-cones are constructed from the BigMDPL simulation, which is large
enough (2.5 $h^{-1}$ Gpc) to map the CMASS NGC. We use the periodic boundary
conditions to maximise the use of the volume (Manera et al. 2013), but we do not
reuse any region of the box, so there are no duplicated structures in our
light-cone.

The first step in the construction of the light-cone is to locate the observer ($z =
0$) and transform from comoving cartesian coordinates to equatorial coordinates (RA,
DEC) and redshift. To include the effects of galaxy peculiar velocities in the
redshift measurements, we transform the coordinates of the halos to redshift-space
using:


$s = r_c + \dfrac{\mathbf{v}\cdot\hat{\mathbf{r}}}{a\,H(z_{\rm real})}$, (12)


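where $r_c$ is the comoving distance in real space, $\mathbf{v}$ is the velocity of
the object with respect to the Hubble flow, $\hat{\mathbf{r}}$ is the line-of-sight
direction, $a$ is the scale factor and $H$ is the Hubble parameter at $z_{\rm
real}$, which is the redshift corresponding to $r_c$ and is computed from

$r_c(z_{\rm real}) = \displaystyle\int_0^{z_{\rm real}} \frac{c\,dz}{H_0\sqrt{\Omega_m(1+z)^3 + \Omega_\Lambda}}$, (13)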

where $c$ is the speed of light and $H_0$ is the Hubble constant in km s$^{-1}$
Mpc$^{-1}$. Using equations (12) and (13) it is possible to compute $s(z_{\rm
obs})$, where $z_{\rm obs}$ is the observed redshift. The next step is to select
objects from each snapshot to construct shells for the light-cone. Thus, an object
with redshift $z_{\rm obs}$, which comes from a snapshot at $z = z_i$, will be
selected if $(z_i + z_{i-1})/2 < z_{\rm obs} < (z_i + z_{i+1})/2$. We repeat this
process for all objects in snapshots between $z = 0.43$ and $z = 0.7$. We fix the
number density in each shell following the radial selection function of the BOSS
CMASS sample. Figure 4 shows the comparison between the radial selection function of
the observed data and the one obtained on the BigMD-BOSS light-cone.
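The redshift-space mapping of equations (12) and (13) and the shell selection can be
sketched as follows. This is a hedged illustration rather than the SUGAR code:
distances are in $h^{-1}$ Mpc (so $H_0 = 100$ in those units), the integral is
evaluated with Simpson's rule, the inversion $r_c \to z$ uses bisection, and the
first and last snapshots are assumed to have open-ended shells; all function names
are hypothetical.

```python
import math

OMEGA_M, OMEGA_L = 0.307, 0.693   # cosmology adopted in this work
C_KMS = 299792.458                # speed of light [km/s]
H0 = 100.0                        # km/s per (h^-1 Mpc)


def e_of_z(z):
    """Dimensionless Hubble rate H(z)/H0 for flat LambdaCDM."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, n_steps=512):
    """Equation (13) in h^-1 Mpc, by Simpson's rule (n_steps must be even)."""
    if z == 0.0:
        return 0.0
    h = z / n_steps
    total = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) / e_of_z(k * h)
    return C_KMS / H0 * total * h / 3.0


def redshift_from_distance(r_c, z_hi=5.0):
    """Invert equation (13) by bisection on [0, z_hi]."""
    lo, hi = 0.0, z_hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if comoving_distance(mid) < r_c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def redshift_space_distance(r_c, v_los):
    """Equation (12): v_los is the line-of-sight peculiar velocity in km/s."""
    z_real = redshift_from_distance(r_c)
    a = 1.0 / (1.0 + z_real)
    return r_c + v_los / (a * H0 * e_of_z(z_real))


def in_shell(z_obs, z_snaps, i):
    """Shell test for snapshot i; z_snaps must be sorted in increasing z."""
    lower = 0.5 * (z_snaps[i] + z_snaps[i - 1]) if i > 0 else float("-inf")
    upper = (0.5 * (z_snaps[i] + z_snaps[i + 1])
             if i < len(z_snaps) - 1 else float("inf"))
    return lower < z_obs < upper
```

The observed redshift of an object then follows from
`redshift_from_distance(redshift_space_distance(r_c, v_los))`, and the object is
kept only if `in_shell` is true for the snapshot it was drawn from.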

Finally, we apply the angular CMASS NGC mask to match the area of the observed
sample. The angular completeness is taken into account by downsampling the regions
where it is smaller than one. As was done in the BOSS CMASS catalogue, we select
regions in the sky with completeness weight larger than 0.7. Due to the presence of
random numbers in the selection process, the observed radial selection function can
have variations of $\sim 4\%$. Figure 4 presents the standard deviation from 100
md-patchy mocks to examine the effect of different seeds in the random generator.

Figure 4. The comoving number density of BOSS CMASS DR12 NGC (black line) compared
to the comoving number density of the BigMD-BOSS light-cone (dashed line). Shaded
area comes from 100 md-patchy mocks.

Figure 1 shows the angular distribution of the BigMD-BOSS light-cone. In order to
reproduce the angular distribution, we applied the BOSS CMASS DR12 NGC geometry and,
in addition, the veto masks, to exclude exactly the same regions removed in the
observed data. Figure 2 presents a 2D comparison of the spatial galaxy distribution
between the BigMD-BOSS light-cone and the BOSS CMASS data.

4.4 Stellar Mass Incompleteness 

This paper focuses on the production of mocks which can describe the full CMASS DR12
sample. Instead of extracting a sub-sample which has better completeness in terms of
stellar mass, we “model” the observed stellar mass incompleteness. This model
accounts not only for the incompleteness at small masses (present across the
complete redshift range), but also for incompleteness in the high-mass end, which is
important for $z < 0.45$. Figure 5 compares the results of our modelling in the
BigMD-BOSS light-cone to the observed data for three different redshifts.

In order to reproduce the observed stellar mass distribution, we construct a
continuous function by interpolation. Once the abundance matching is applied and
galaxies are assigned to dark matter halos, we select galaxies by downsampling based
on the observed stellar mass distribution. This process is repeated for 20 different
redshifts (corresponding to the snapshots of the simulation). Then, in order to
construct the observed stellar mass distribution corresponding to the snapshot at $z
= z_i$, a galaxy with redshift $z_g$ in the stellar mass catalogue will be selected
if $(z_i + z_{i-1})/2 < z_g < (z_i + z_{i+1})/2$. This model has an important impact
on the scatter applied to the abundance matching. Since bias is a function of
stellar mass, incompleteness that varies as a function of stellar mass will affect
the overall bias as well. This effect reduces the amplitude of the clustering, which
implies that a smaller scatter is required to reproduce the signal of the observed
clustering. If we ignore the incompleteness effect, we can still reproduce the
clustering in the two-point correlation function. However, this scatter is not the
intrinsic one, and the final stellar mass distribution will not match the observed
sample. Favole et al. (2015a) show a similar model to reproduce the incompleteness
of the ELG population from the BOSS sample.
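The downsampling step can be illustrated with a short sketch. It assumes, for
illustration only, that the incompleteness within one redshift shell is tabulated as
a keep-probability (observed number density divided by the adopted SMF) at a few
stellar mass nodes and linearly interpolated between them; the function names and
the tabulated values in the usage note are hypothetical.

```python
import bisect
import random


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, held constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def downsample(log_mstar, mass_nodes, keep_fraction, seed=0):
    """Return indices of mock galaxies kept after modelling incompleteness.

    keep_fraction[k] is the assumed ratio of observed to complete number
    density at mass_nodes[k], clipped to the range [0, 1].
    """
    rng = random.Random(seed)
    kept = []
    for k, m in enumerate(log_mstar):
        p = min(1.0, max(0.0, interpolate(m, mass_nodes, keep_fraction)))
        if rng.random() < p:
            kept.append(k)
    return kept
```

For example, `downsample(masses, [11.0, 11.5], [0.2, 1.0])` keeps about 20 per cent
of the galaxies near $\log_{10}M_* = 11.0$ and all galaxies above 11.5. In the
light-cone this would be repeated with a separate table for each of the 20 shells.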

Most galaxies in the CMASS sample are red galaxies. However, there is also a
fraction of blue galaxies in the data. In addition, the blue sample is less complete
than the red one (Montero-Dorta et al. 2014). The random downsampling of galaxies in
the BigMD-BOSS light-cone does not distinguish between the two populations, which
can produce potential systematics due to their different completeness. In this
study, we reproduce the observed stellar mass distribution by downsampling galaxies
from a non-evolving SMF. However, the SMF evolves with redshift, which can lead to
an underestimation of the incompleteness for some ranges of stellar mass and an
overestimation for other ranges.

4.5 Fiber Collisions 

A feature of the BOSS fiber-fed spectrograph is that the finite size of the fiber
housing makes it impossible to place fibers within 62$''$ of each other on the same
plate. This causes a number of galaxies to not have a fiber assigned and hence,
there is no measurement of their redshift. A total of 5% of the CMASS targets could
not be observed due to fiber collisions. These objects have an important effect at
scales $\lesssim 10$ $h^{-1}$ Mpc. In this paper, we model the fiber collision
effect by adopting the method described in Guo et al. (2012).

The first step is to find the maximum number of galaxies that could be assigned
fibers. This decollided sample ($D_1$) is a set of galaxies which are not angularly
collided with other galaxies in this subsample. The second population ($D_2$)
contains the potentially collided galaxies. Each galaxy in this subsample is within
the fiber collision scale of a galaxy in population 1. We must determine from the
observed sample the fraction of collided galaxies ($D_2'$) in the $D_2$ group (i.e.
$D_2'/D_2$) for sectors covered by different numbers of tiles. Finally, we randomly
select a fraction $D_2'/D_2$ of the $D_2$ galaxies in the mocks to be collided
galaxies.
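A simplified sketch of this assignment is given below. It is not the Guo et al.
(2012) algorithm: the decollided set is built with a single greedy pass (which
yields a valid but not necessarily maximal $D_1$), pairs are found by brute force,
and one collided fraction is used instead of a value per sector and tile coverage.
All names are illustrative.

```python
import math
import random

FIBER_SCALE_DEG = 62.0 / 3600.0   # 62 arcsec collision scale in degrees


def angular_separation(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = (math.radians(x) for x in (ra1, dec1, ra2, dec2))
    h = (math.sin(0.5 * (d2 - d1)) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin(0.5 * (r2 - r1)) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def split_decollided(ra, dec, scale=FIBER_SCALE_DEG):
    """Greedy split into decollided (D1) and potentially collided (D2) indices."""
    d1, d2 = [], []
    for k in range(len(ra)):
        collides = any(
            angular_separation(ra[k], dec[k], ra[j], dec[j]) < scale for j in d1
        )
        (d2 if collides else d1).append(k)
    return d1, d2


def assign_collisions(d2, collided_fraction, seed=0):
    """Randomly flag a fraction D2'/D2 of the D2 galaxies as collided."""
    rng = random.Random(seed)
    return [k for k in d2 if rng.random() < collided_fraction]
```

In a full implementation `collided_fraction` would be measured from the data
separately for sectors covered by one, two or more tiles, since overlapping tiles
resolve part of the collisions.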

Figure 6 displays the impact of the fiber collisions on the correlation function in
redshift-space. The effect on the monopole becomes very important for scales smaller
than 1 $h^{-1}$ Mpc. However, the quadrupole is more sensitive to this effect, with
a large impact for scales smaller than 10 $h^{-1}$ Mpc. The assignment of fiber
collisions has an important impact on the fraction of satellites. Before fiber
collisions the satellite fraction of the light-cone is 11.8%, and after the
assignment it is 10.5%. This effect reduces the number of central-satellite pairs,
which have a strong impact on the quadrupole.

Unlike Guo et al. (2012), we only use nearest neighbour weights for both samples.
Our goal is to compare the results of the abundance matching with data, so we apply
to our light-cone the same fiber collision correction that is applied to the
observed data.

When nearest neighbour weights are applied, a collided galaxy will be “moved” from
its original coordinates to the position of its nearest neighbour. Figure 7 presents
the line of sight displacement of those collided galaxies from their original
positions.

Figure 5. Incompleteness modelling for three different redshift bins. Shaded area
shows the BigMD-BOSS light-cone, dots are the measurements from the CMASS Portsmouth
catalogue. In both cases Poissonian errors are used. Dashed line represents the SMF
adopted in this work. We select three bins as an example to show the results of the
incompleteness modelling implemented in this work. Stellar mass distributions in the
BigMD-BOSS light-cone are produced by downsampling galaxies from the adopted SMF.
The left panel shows the incompleteness at low redshift in the high-mass end of the
SMF.




Figure 7. Line of sight displacement of a collided galaxy due to 
the fiber collision. The figure shows the number of counts per bin 
divided by the total number of collided galaxies. Uncertainties 
were computed using Poissonian errors. 


Figure 6. Monopole (top panel) and quadrupole (bottom panel) 
of the redshift-space correlation function for the BigMD-BOSS 
light-cone before and after applying fiber collisions. Fiber 
collisions are corrected using nearest neighbour (NN) weights. The 
effects of the fiber collisions are stronger in the quadrupole, with 
important differences for scales $s < 7\ h^{-1}$ Mpc. The impact on the 
monopole is smaller. The fiber collision assignment is an approximate 
method which can introduce systematic effects. In order to 
avoid these effects, we select the range 2 $h^{-1}$ Mpc to 30 $h^{-1}$ Mpc 
to fit the monopole with the scatter parameter, $\sigma_{\rm HAM}(V_{\rm peak}|M_*)$. 


The displacement for the simulation shown in Figure 7 
is computed using the old and new positions of the collided 
galaxies. In the CMASS data, the displacement is calculated 
using the overlapping tiled regions of the survey, where the 
spectroscopic redshifts of both galaxies within the fiber 
collision angular scale are resolved. Figure 7 demonstrates an 
excellent agreement between our model and the observed 
data, suggesting that the combination of the small-scale 
clustering of the simulation and the fiber collision model 
used in the mock is in reasonable agreement with 
observations. 


5 MODELLING BOSS CMASS CLUSTERING 

The clustering signal in the abundance matching is 
determined by two quantities: the number density and the 
scatter in the $M_*$ - $V_{\rm peak}$ relation. The number density is fixed 
by the radial selection function of the observed sample. In 
order to find a scatter value that reproduces the clustering 
of the CMASS sample, we fit the monopole of the 
correlation function in redshift space. The following sections 
present the results of this monopole fitting, and the predictions 
of our model for the quadrupole in redshift space, the 
projected correlation function, the monopole in Fourier space and 
the three-point correlation function. 
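
To make the role of the two quantities concrete, the following is a minimal sketch of rank-order abundance matching with scatter. It is an illustration under stated assumptions rather than the implementation used for the light-cone: the function name `abundance_match`, the multiplicative Gaussian scatter applied to $V_{\rm peak}$, and the use of a pre-drawn list of stellar masses (whose length sets the number density) are all choices made for this example.

```python
import numpy as np

def abundance_match(v_peak, stellar_masses, sigma_ham, seed=0):
    """Assign stellar masses to halos by rank, with scatter in V_peak.

    v_peak         : peak circular velocities of the (sub)halos
    stellar_masses : masses drawn from the target SMF; their number
                     fixes the galaxy number density
    sigma_ham      : fractional Gaussian scatter applied to V_peak
    Returns an array aligned with v_peak (NaN for unoccupied halos).
    """
    rng = np.random.default_rng(seed)
    v_peak = np.asarray(v_peak, dtype=float)
    m_sorted = np.sort(np.asarray(stellar_masses, dtype=float))[::-1]
    if m_sorted.size > v_peak.size:
        raise ValueError("more galaxies than halos")

    # Scattered matching proxy (assumed form): V * (1 + N(0, sigma)).
    v_scat = v_peak * (1.0 + rng.normal(0.0, sigma_ham, size=v_peak.size))

    # Rank halos by the scattered proxy, highest first, and pair the
    # k-th ranked halo with the k-th most massive galaxy.
    order = np.argsort(v_scat)[::-1]
    m_star = np.full(v_peak.size, np.nan)
    m_star[order[:m_sorted.size]] = m_sorted
    return m_star
```

A larger `sigma_ham` moves massive galaxies into lower-velocity halos, which lowers the clustering amplitude; fitting the monopole then amounts to scanning this one parameter at fixed number density.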

The BigMD-BOSS light-cone covers the same volume as the 
CMASS sample between redshifts z = 0.43 and z = 0.7. In 
order to have a good estimate of the uncertainties in our 
measurements we use 100 MD-PATCHY mocks (Kitaura et al. 
2016, companion paper). These mocks are produced using 
five boxes at different redshifts that are created with the 
PATCHY code (Kitaura et al. 2014). This code matches the 
clustering of the galaxy catalogues for each redshift bin. The 
MD-PATCHY mocks are based on the BigMDPL simulation, 
and they are produced with the same cosmology used in 
this work. To compute errors we use the square root of the 
diagonal terms of the covariance matrix, defined as 

\[ C_{ab} = \frac{1}{N-1} \sum_{i=1}^{N} \left(X_a^{i} - \bar{X}_a\right)\left(X_b^{i} - \bar{X}_b\right), \tag{14} \]

where N is the number of mock catalogues, X is the 
statistical quantity measured ($X_a^{i}$ being its value in bin a 
of mock i) and $\bar{X}_a$ is its mean over the mocks. 
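
As an illustration of how such errors can be obtained from a set of mocks, here is a short sketch. The function name `mock_covariance` and the (mocks × bins) array layout are assumptions for this example, and the unbiased 1/(N-1) normalisation is assumed as well.

```python
import numpy as np

def mock_covariance(measurements):
    """Covariance matrix and 1-sigma errors from mock measurements.

    measurements : array of shape (N_mocks, N_bins), one row per mock.
    Returns (cov, sigma), with sigma the square root of the diagonal.
    """
    x = np.asarray(measurements, dtype=float)
    n_mocks = x.shape[0]
    diff = x - x.mean(axis=0)            # deviation from the mock mean
    cov = diff.T @ diff / (n_mocks - 1)  # assumed unbiased normalisation
    return cov, np.sqrt(np.diag(cov))
```

With 100 mocks and, say, 20 separation bins, `measurements` would have shape (100, 20) and `sigma` would give the error bar for each bin.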


5.1 Two-point clustering: results from model and 
observations 


In order to compute the correlation function for our 
light-cone and the observed data, we use the Landy & Szalay 
estimator (Landy & Szalay 1993). The correlation function is 
defined by 

\[ \xi(r) = \frac{DD - 2DR + RR}{RR}, \tag{15} \]

where DD, DR and RR represent the normalised data-data, 
data-random and random-random pair counts, respectively, 
for the distance range $[r - \Delta r/2,\ r + \Delta r/2]$. 
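
A minimal sketch of this estimator, starting from raw pair counts, is given below. The function name `landy_szalay` and the counting convention (unique pairs for DD and RR, all cross pairs for DR, unweighted objects) are assumptions for the example; weighted catalogues would replace the object numbers by sums of weights.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_random):
    """Landy & Szalay estimate of xi(r) from raw pair counts per bin.

    dd, rr : unique data-data and random-random pair counts
    dr     : data-random cross pair counts
    """
    dd = np.asarray(dd, dtype=float)
    dr = np.asarray(dr, dtype=float)
    rr = np.asarray(rr, dtype=float)

    # Normalise each count by the total number of available pairs.
    dd_n = dd / (n_data * (n_data - 1) / 2.0)
    dr_n = dr / (n_data * n_random)
    rr_n = rr / (n_random * (n_random - 1) / 2.0)

    return (dd_n - 2.0 * dr_n + rr_n) / rr_n
```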

In this paper we use random catalogues 20 times larger 
than the data catalogues. In order to estimate the projected 
correlation function and the multipoles of the correlation 
function we use the 2D correlation function, $\xi(r_p, \pi)$, where 
$s = \sqrt{r_p^2 + \pi^2}$, $r_p$ is the component perpendicular to the line 
of sight and $\pi$ is the parallel component. The 
correlation function of the BigMD-BOSS light-cone is computed 
using close pair weights and FKP weights (Feldman et al. 
1994), 

\[ w_{\rm FKP} = \frac{1}{1 + n(z) P_{\rm FKP}}, \tag{16} \]

where $n(z)$ is the number density at redshift z and $P_{\rm FKP}$ = 
20000 $h^{-3}$ Mpc$^{3}$. We use the FKP weights to optimally 
weight regions with different number densities. In the case of 
the BOSS CMASS sample, we use the galaxy weights given 
in equation 9 in addition to the FKP weights. The total 
weights for the data used in our analysis are $w_{\rm tot} = w_{\rm FKP} w_g$. 
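
The weighting can be written compactly as follows. This is a small sketch; the names `fkp_weight` and `total_weight` are assumptions for the example, and `w_g` stands for whatever per-galaxy weight (systematics, close pairs) is applied.

```python
import numpy as np

P_FKP = 20000.0  # h^-3 Mpc^3, the value quoted in the text

def fkp_weight(nbar, p_fkp=P_FKP):
    """FKP weight 1 / (1 + n(z) P_FKP) for number density nbar."""
    return 1.0 / (1.0 + np.asarray(nbar, dtype=float) * p_fkp)

def total_weight(nbar, w_g=1.0, p_fkp=P_FKP):
    """Total weight w_tot = w_FKP * w_g for each galaxy."""
    return fkp_weight(nbar, p_fkp) * np.asarray(w_g, dtype=float)
```

For a number density of $4 \times 10^{-4}\ h^{3}$ Mpc$^{-3}$ the product $n P_{\rm FKP}$ is 8, so the FKP weight is 1/9; lower-density regions receive weights closer to unity.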

Note that $P_{\rm FKP}$ is chosen to minimise the variance of 
power spectrum measurements. For correlation function 
measurements, one should use the optimal weight from 
Hamilton (1993), 

\[ w_H = \frac{1}{1 + n(z) J_w}, \tag{17} \]

where 

\[ J_w = \int_0^{r} \xi(r)\, dV. \tag{18} \]

However, since we are fixing $w_{\rm FKP}$ or $w_H$ to be a constant 
to simplify the computation, we expect that $w_H$ should be 
similar to $w_{\rm FKP}$. In any case, the choice of optimal weight will 
not bias the measurements. 


5.1.1 Redshift-space correlation function 

Previous works demonstrated the impact of the scatter on 
the clustering signal of a mock generated with abundance 
matching (e.g., Reddick et al. 2013). In this study, we search 
for a scatter parameter, $\sigma_{\rm HAM}(V_{\rm peak}|M_*)$, which reproduces 
the monopole of the correlation function, and provide the 
predictions for other quantities. The multipoles of the 
two-point correlation function in redshift space are defined by 

\[ \xi_l(s) = \frac{2l+1}{2} \int_{-1}^{1} \xi(r_p, \pi)\, P_l(\mu)\, d\mu, \tag{19} \]

where 

\[ \mu = \frac{\pi}{\sqrt{r_p^2 + \pi^2}}, \tag{20} \]

and $P_l(\mu)$ is the Legendre polynomial. We will present 
results for the monopole (l = 0) and the quadrupole (l = 2). 
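
The multipole integral can be evaluated numerically as in the sketch below. It assumes that the correlation function at a fixed separation s is available as a callable of $\mu$ (for instance an interpolation of the binned 2D measurement); the names `mu_from_rp_pi` and `multipole`, and the use of Gauss-Legendre quadrature, are choices made for this example.

```python
import numpy as np
from numpy.polynomial import legendre

def mu_from_rp_pi(r_p, pi):
    """Cosine of the angle to the line of sight, mu = pi / s."""
    return pi / np.hypot(r_p, pi)

def multipole(xi_of_mu, ell, n_nodes=32):
    """Multipole xi_ell at fixed s for a callable xi(mu).

    Evaluates (2 ell + 1)/2 * integral_{-1}^{1} xi(mu) P_ell(mu) dmu
    with Gauss-Legendre quadrature.
    """
    mu, weights = legendre.leggauss(n_nodes)
    coeffs = np.zeros(ell + 1)
    coeffs[ell] = 1.0                      # selects P_ell
    p_ell = legendre.legval(mu, coeffs)
    return 0.5 * (2 * ell + 1) * np.sum(weights * xi_of_mu(mu) * p_ell)
```

For a measurement binned in $(r_p, \pi)$, each bin centre maps to $s = \sqrt{r_p^2 + \pi^2}$ and to $\mu$ through `mu_from_rp_pi`, after which the sum over $\mu$ at fixed s plays the role of the quadrature above.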

To find the best value, we fit the clustering using the 
monopole in redshift space for the range 2 to 30 $h^{-1}$ 
Mpc. The top panel of Figure 8 shows the results of the 
fitting compared to the CMASS DR12 data. Errors in 
Figure 8 and in Figure 9 are computed using 100 MD-PATCHY 
mocks (Kitaura et al. 2016, companion paper). The 
parameter that best reproduces the clustering in the monopole is 
$\sigma_{\rm HAM}(V_{\rm peak}|M_*) = 0.31$. This result is in agreement with 
previous works on abundance matching (Trujillo-Gomez et al. 
2011; Nuza et al. 2013; Reddick et al. 2013). 

The simulation provides a good agreement with the data in 
the monopole for scales smaller than 50 $h^{-1}$ Mpc. However, 
the bottom panel of Figure 8 shows a disagreement in the 
quadrupole for scales smaller than 0.7 $h^{-1}$ Mpc, which can 
be due to the method used to assign the fiber collisions in 
the BigMD-BOSS light-cone. For this reason, we do not 
analyse these scales. An additional disagreement is found 
at scales larger than 6 $h^{-1}$ Mpc, which is discussed 
in Section 6. Nuza et al. (2013) use the 
MultiDark simulation with $\Omega_m = 0.27$. Compared with their 
results for the monopole, we obtain a better agreement for 
scales larger than 10 $h^{-1}$ Mpc, mainly due to the difference 
between the cosmologies used in the two works. 

Figure 9 shows the prediction of the monopole and 
quadrupole for large scales compared to the observed data. 
The discrepancies between the model and the 
data at some scales larger than 60 $h^{-1}$ Mpc may not be due 
to cosmic variance alone. Differences at the baryon acoustic 
oscillation (BAO) scales are of the order of the 1-$\sigma$ errors, 
while for larger scales the differences can be of the order of 2 or 
3 $\sigma$. In Figure 9, we can see that the BOSS CMASS 
correlation function at large scales is systematically shifted. 
This excess of power in the correlation function monopole 
could be due to potential photometric calibration 
systematics, which only affect very large scales. Huterer et al. 
(2013) make a detailed study of photometric calibration 
errors and their implications for clustering 
measurements, and demonstrate that calibration uncertainties 
generically lead to large-scale power. 


5.1.2 Projected correlation function 

The projected correlation function is a quantity which is 
insensitive to the impact of the Redshift-space distortion 

Figure 8. Top panel: Monopole in redshift space from the CMASS 
DR12 sample (black points). The shaded area represents the 
modelling of the monopole using the BigMD-BOSS light-cone. 
Bottom panel: Quadrupole in redshift space from the CMASS 
DR12 sample compared with the theoretical prediction from the 
BigMD-BOSS light-cone. Error bars were computed using 
MD-PATCHY mocks. Small panels show the ratio between the model 
and the observed data. Fitting of the monopole is performed 
between 2 $h^{-1}$ Mpc and 30 $h^{-1}$ Mpc. The observed monopole is 
in good agreement with our model for scales larger than 2 $h^{-1}$ 
Mpc. However, the quadrupole shows tensions with observations 
for scales < 1 $h^{-1}$ Mpc and > 5 $h^{-1}$ Mpc. 

and provides an approximation to the real-space correlation 
function (Davis & Peebles 1983). The projected correlation 
function is defined as the integral of the 2D correlation 
function, $\xi(r_p, \pi)$, over the line of sight: 

\[ w_p(r_p) = 2 \int_0^{\infty} \xi(r_p, \pi)\, d\pi. \tag{21} \]

In order to compute $w_p(r_p)$ from the discrete correlation 
function (equation (15)), we use the estimator 

\[ w_p(r_p) = 2 \sum_i \xi(r_p, \pi_i)\, \Delta\pi_i. \tag{22} \]

We adopt a linear binning in the line-of-sight direction, 
$\Delta\pi_i = \Delta\pi = 5\ h^{-1}$ Mpc. We select $\pi_{\max} = 100\ h^{-1}$ Mpc. 
Nuza et al. (2013) find convergence of the projected 
correlation function for this scale. Figure 10 shows the results found for the 
BigMD-BOSS light-cone compared to the CMASS data. 
Error bars were computed using 100 MD-PATCHY mocks. 
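
The discrete estimator amounts to a single sum over line-of-sight bins, as in this short sketch. The function name `projected_correlation` and the array layout (rows for $r_p$ bins, columns for linear $\pi$ bins starting at $\pi = 0$) are assumptions for the example.

```python
import numpy as np

def projected_correlation(xi_rp_pi, delta_pi=5.0, pi_max=100.0):
    """w_p(r_p) = 2 * sum_i xi(r_p, pi_i) * delta_pi up to pi_max.

    xi_rp_pi : array of shape (n_rp, n_pi) with linear pi bins of
               width delta_pi (in h^-1 Mpc) starting at pi = 0.
    """
    xi = np.asarray(xi_rp_pi, dtype=float)
    n_pi = int(round(pi_max / delta_pi))   # 20 bins for 100 / 5
    if xi.shape[1] < n_pi:
        raise ValueError("xi does not extend to pi_max")
    return 2.0 * np.sum(xi[:, :n_pi], axis=1) * delta_pi
```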




Figure 9. Monopole (top panel) and quadrupole (bottom panel) 
of the redshift-space correlation function. The shaded areas are 
the model predictions for large scales using a single light-cone. 
Error bars were computed using MD-PATCHY mocks. Differences 
in the quadrupole are the same as those shown in Figure 8. The 
monopole shows a good agreement up to 100 $h^{-1}$ Mpc. 
Larger scales present significant differences, which can be due to 
cosmic variance and remaining systematics in the data. These 
differences are within 2-$\sigma$ errors. 





Figure 10. Projected correlation function prediction from the 
BigMD-BOSS light-cone (shaded region) compared to the BOSS 
CMASS sample. The width of the shaded area represents 1-$\sigma$ 
errors, computed using MD-PATCHY mocks. Our model reproduces 
the clustering for all relevant scales. Scales < 0.6 $h^{-1}$ Mpc are 
dominated by fiber collision effects. 


Figure 10 reveals a discrepancy at scales 3 $h^{-1}$ Mpc. 
However, results are in agreement at the 2-$\sigma$ level, so we can 
consider the data consistent with the prediction of our model. 
Scales below 0.5 $h^{-1}$ Mpc are dominated by fiber collisions. 
Due to this effect, the clustering declines rapidly. 
Figure 11. Monopole of the power spectrum from the BigMD-BOSS 
light-cone and the CMASS DR12 sample. Top panel: The true 
power spectrum for our light-cone compared to the CMASS DR12 
data corrected for fiber collisions using the method of Hahn et al. 
(in prep.). The solid curve shows the initial matter power spectrum of 
the BigMDPL simulation scaled to match the amplitude of 
fluctuations at long wavelengths. A remarkable agreement between the 
data and the model is found for scales $k < 1\ h$ Mpc$^{-1}$. Bottom 
panel: The comparison between simulation and observed data 
using nearest neighbour weights ($w_{\rm cp}$). In addition to $w_{\rm cp}$, observed 
measurements include systematics weights: $w_{\rm star}$ and $w_{\rm see}$. 
The agreement between the data and the model, in both panels, 
shows the good performance of the fiber collision assignment in 
the light-cone. In the bottom subpanels, dashed lines represent an 
accuracy level of 10%. 


5.1.3 Fourier space 

The power spectra for the BOSS CMASS sample with nearest 
angular neighbour upweighting and the BigMDPL 
are computed using the Feldman et al. (1994) power 
spectrum estimator, modified to account for the systematic 
weights of the galaxies. In BOSS CMASS, each galaxy is 
assigned a systematics weight (equation (9)), which is 
accounted for in the estimator. For the BigMD-BOSS 
light-cone, we set $w_g = w_{\rm cp}$ for the power spectrum using nearest 
neighbour upweighted fiber collision weights, and $w_g = 1$ 
for the true power spectrum. 

The power spectrum for the BOSS CMASS sample is 
computed using the method described in Hahn et al. (in 
prep.) in order to correct the effects of fiber collisions on 
smaller scales. The fiber collision correction method 
reconstructs the clustering of fiber-collided pairs by modelling 
the distribution of the line-of-sight displacements between 
them using pairs with measured redshifts. In addition, the 
method corrects fiber collisions in the shot-noise correction 
term of the power spectrum estimator. In simulated mock 
catalogues, the correction method successfully reproduces 
the true power spectrum with residuals < 1% at $k \sim 0.3$ 
$h$ Mpc$^{-1}$ and < 10% at $k \sim 0.9\ h$ Mpc$^{-1}$. The top panel of 
Figure 11 compares the fiber collision and systematics 
corrected BOSS CMASS power spectrum to the true power 
spectrum of the BigMD-BOSS light-cone, showing remarkably 
good agreement between data and model. Figures 8 and 11 
confirm that the standard HAM is accurate in the modelling 
of the clustering not only at large scales, but also in the 
one-halo term. 

Monopoles from our model and the BOSS CMASS data 
using fiber collision weights are shown in the bottom panel 
of Figure 11. Both power spectra agree for k smaller than 1 
$h$ Mpc$^{-1}$. The BigMD-BOSS light-cone and the observed 
data show a remarkably good agreement in the BAO region 
(inset panel of Figure 11), which is not seen in the correlation 
function (Figure 9). This difference can be due to remaining 
systematics that have a bigger impact on the correlation 
function than on the power spectrum. The agreement 
between our model and the observed data, for the true power 
spectrum and the nearest neighbour corrected power 
spectrum, demonstrates that the method used to assign fiber 
collisions in the BigMD-BOSS light-cone is a good approach 
to simulate this effect. 

As we discussed in Section 5.1.1, the disagreement 
between the model and the data in the correlation function 
monopole could be due to potential photometric calibration 
systematics. The effect on the power spectrum will be 
limited to very small k, so that it has less impact on the BAO 
scales. However, this excess of power does not have an impact 
on BAO measurements from correlation functions when we 
marginalise the overall shape (see Chuang et al. 2013, Ross 
et al. in prep.). 


5.2 Three-point correlation function 

We are also interested in comparing the prediction of the 
three-point correlation function (3PCF) using the HAM on the 
BigMDPL simulation with the observed data. The 3PCF 
provides a description of the probability of finding three objects 
in three different volumes. In the same manner as the 2PCF, 
the 3PCF is defined as 

\[ \zeta(r_{12}, r_{23}, r_{31}) = \langle \delta(\mathbf{r}_1)\, \delta(\mathbf{r}_2)\, \delta(\mathbf{r}_3) \rangle, \tag{23} \]

where $\delta(\mathbf{r})$ is the dimensionless overdensity at the position 
$\mathbf{r}$ and $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$. We use the Szapudi & Szalay estimator 
(Szapudi & Szalay 1998), 

\[ \zeta = \frac{DDD - 3DDR + 3DRR - RRR}{RRR}. \tag{24} \]

Figure 12 displays our prediction compared with the 
BOSS CMASS data. We show the results for two kinds of 
triangles: $r_1 = r_2 = 10\ h^{-1}$ Mpc, and $r_1 = 10\ h^{-1}$ Mpc, 
$r_2 = 20\ h^{-1}$ Mpc, where $\theta$ is the angle between $r_1$ and $r_2$. 

A good agreement in the shape of the 3PCF is seen 
in Figure 12 between our prediction and the data. Most of 
Figure 13. Stellar-to-halo mass ratio. The shaded blue area 
represents the best fit of the stellar to halo mass relation measured 
using weak lensing in the CFHT Stripe 82 Survey (Shan et al. 
2015). The red area represents the previous HAM result from Behroozi 
et al. (2013c). The analysis in Behroozi et al. (2013c) was modified 
to use the Planck cosmology parameters and to change the 
definition of the halo mass. Black dots are the prediction from the 
HAM-BigMD-BOSS light-cone. Differences between our model 
and Behroozi et al. (2013c) are mainly due to the SMF adopted in 
each work. The scatter in the $M_{200}$ - $M_*$ relation is similar between the 
data and our model. We adopted a constant scatter, while the observed 
data suggest a dependency of the scatter on the stellar mass. 


Figure 12. Top panel: BOSS CMASS DR12 three-point 
correlation function compared with the model prediction of this work. 
The shaded area shows 1-$\sigma$ uncertainties, with limits $r_1 = 10\ h^{-1}$ 
Mpc and $r_2 = 20\ h^{-1}$ Mpc. Bottom panel: Three-point 
correlation function for limits $r_1 = r_2 = 10\ h^{-1}$ Mpc. The BigMD-BOSS 
light-cone can reproduce almost all scales within 2-$\sigma$ errors. 

the points are in agreement within 2-$\sigma$ errors for both 
configurations represented in Figure 12. However, the 
BigMD-BOSS light-cone underestimates the 3PCF for $\theta \sim 0$ 
and $\theta \sim \pi$. Guo et al. (2015a) find similar discrepancies for 
those scales, which can be produced by velocity effects and 
can be corrected by including a velocity bias. Therefore, the 
disagreement in the three-point correlation function and in 
the quadrupole of the correlation function can be caused by 
the same kind of effects. 
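
For reference, the Szapudi & Szalay combination of triplet counts can be written as a one-line function. This sketch assumes that the four inputs are already normalised triplet counts for a given triangle configuration (by analogy with the normalised pair counts of the two-point estimator); the function name `szapudi_szalay` is chosen for the example.

```python
import numpy as np

def szapudi_szalay(ddd, ddr, drr, rrr):
    """3PCF estimate (DDD - 3 DDR + 3 DRR - RRR) / RRR.

    All inputs are normalised triplet counts per configuration bin.
    """
    ddd = np.asarray(ddd, dtype=float)
    ddr = np.asarray(ddr, dtype=float)
    drr = np.asarray(drr, dtype=float)
    rrr = np.asarray(rrr, dtype=float)
    return (ddd - 3.0 * ddr + 3.0 * drr - rrr) / rrr
```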


5.3 Stellar to halo mass relation 

The stellar to halo mass ratio (SHMR) is an important 
quantity for evaluating whether the simulated light-cone 
provides a realistic halo occupation. To this end, we use results 
from weak lensing, which is one of the most powerful tools 
for measuring the observational SHMR. Figure 13 shows the 
SHMR predicted by the BigMD-BOSS light-cone and 
measurements in the CFHT Stripe 82 Survey (Shan et al. 2015). 
In order to ensure the convergence of the halos in our 
prediction, we select halos with masses larger than $5.2 \times 10^{12}\ M_\odot$. 
This limit corresponds to 150 dark matter particles, which ensures 
convergence for subhalos (Klypin et al. 2015). 

Predictions of the abundance matching are in agreement 
with the weak lensing data. In Figure 13, the shaded blue 
area shows the measured intrinsic scatter. The dependency 
of the scatter on stellar mass is clear. It is also found in 
abundance matching studies (e.g., Trujillo-Gomez et al. 2011; 
Reddick et al. 2013). However, our HAM model uses a 
constant scatter to reproduce the clustering. This 
approximation can generate the disagreement in the scatter between 
data and mock. The red area in Figure 13 indicates the 
results from Behroozi et al. (2013c). We modify Behroozi et al. 
(2013c) in order to use the same definition of halo mass and 
to implement the Planck cosmology in the analysis. The SMF 
assumptions can be one of the origins of the disagreement 
between both predictions. While we use the BOSS DR12 
stellar mass catalogues to estimate the SMF, Behroozi et al. 
(2013c) use the PRIMUS SMF (Moustakas et al. 2013). The 
difference in how the stellar mass catalogues handle profile 
fitting produces a variation in the high-mass end of the two 
SMFs. This effect causes an important difference at large stellar 
mass between both predictions. 

Shankar et al. (2014) present the stellar to halo mass 
relation assuming different mass functions and compare their 
results with recent models. They find differences between 
Behroozi et al. (2013c) and Maraston et al. (2013) similar 
to the one shown in our Figure 13. Shankar et al. (2014) 
also find that an intrinsic scatter in stellar mass at fixed halo 
mass of 0.15 dex is needed to reproduce the BOSS clustering. 
This result is in agreement with our model, which predicts 
an intrinsic scatter in stellar mass of 0.14 dex at a fixed halo 
mass. 
Figure 14. Scale-dependent galaxy bias from the model 
presented in this work. We measure the bias with respect to the 
correlation function of dark matter in the BigMDPL light-cone 
for the data and the model. There is an excellent agreement 
between the CMASS observations and the predictions of the 
HAM-BigMD-BOSS light-cone. 

5.4 Bias prediction 

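Using the HAM-BigMD-BOSS light-cone and its 
corresponding dark matter light-cone we can estimate the 
real-space bias, $b(r)$, by solving the equation (Kaiser 1987; Hamilton 
1992) 

\[ \xi(s) = \left(1 + \frac{2}{3}\beta + \frac{1}{5}\beta^2\right) b(r)^2\, \xi_{\rm DM}(r), \tag{25} \]

where $\xi(s)$ is the redshift-space monopole of the galaxies, 
$\xi_{\rm DM}(r)$ is the real-space dark matter correlation function, 
$\beta = f/b$ is the redshift-space distortion parameter and 
$f(z = 0.55) = 0.77$ (Planck cosmology). 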

Figure 14 shows the linear bias, which is in agreement 
with previous papers that reproduced the CMASS 
clustering (see Nuza et al. 2013). For the data and the model, we 
use the dark matter correlation function from the BigMD 
simulation. For the scales shown, the scale-dependent bias 
factor is in the range 1.8-2. We use the BigMD dark 
matter light-cone to estimate the relative bias of the CMASS 
sample to this catalogue. 
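
The bias can be extracted in closed form if one assumes the linear Kaiser relation, $\xi(s) = (1 + 2\beta/3 + \beta^2/5)\, b^2\, \xi_{\rm DM}(r)$ with $\beta = f/b$. Substituting $\beta$ turns this into a quadratic in b, $b^2 + (2f/3)\, b + f^2/5 = \xi(s)/\xi_{\rm DM}(r)$. The sketch below solves it for the positive root; the function name `real_space_bias` and its arguments are assumptions made for this example.

```python
import numpy as np

def real_space_bias(xi_gal_s, xi_dm_r, f=0.77):
    """Solve b^2 + (2 f / 3) b + f^2 / 5 = xi_gal(s) / xi_DM(r) for b.

    xi_gal_s : redshift-space galaxy monopole
    xi_dm_r  : real-space dark matter correlation function
    f        : growth rate (0.77 at z = 0.55 in the text)
    Returns the positive root at each scale.
    """
    ratio = np.asarray(xi_gal_s, dtype=float) / np.asarray(xi_dm_r, dtype=float)
    # Positive root of the quadratic: b = -f/3 + sqrt(ratio - 4 f^2 / 45).
    return -f / 3.0 + np.sqrt(ratio - 4.0 * f**2 / 45.0)
```

Applied scale by scale, this gives a scale-dependent $b(r)$ of the kind plotted in Figure 14.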


6 DISCUSSION 

The BigMD-BOSS light-cone is designed to reproduce the 
full BOSS CMASS sample between redshifts 0.43 and 0.7, 
including observational effects. In order to recover the 
information at small scales, similar papers (e.g., Guo et al. 2014, 
2015b; Nuza et al. 2013) correct the observed data for fiber 
collisions (see Guo et al. 2012, Hahn et al. in prep.). In this 
work, we assign fiber collisions to galaxies in the light-cone 
and we use nearest neighbour weights in the data and in 
the model. Our model can be useful to test methods that 
recover the clustering in the fiber collision region (Guo et al. 
2012) or in the production of mocks for covariance matrices 
(Kitaura et al. 2016, companion paper). The fiber collision 
assignment adopted in this work reproduces this 
observational effect well (Figure 11). However, this 
approach can introduce small systematics that we do not include 
in our modelling. 

White et al. (2011) model the full CMASS clustering. 
They find a good fit of the HOD parameters to reproduce 
the observed data. However, they cannot describe the small 
scales because they only include close pair weights in the 
data measurements, which cannot recover the small-scale 
clustering (Guo et al. 2012). Nuza et al. (2013) also 
reproduce the CMASS data with good agreement using a 
standard HAM model; they correct for fiber collisions using the 
method explained in Guo et al. (2012). Our paper continues 
the work presented in Nuza et al. (2013), including light-cone 
effects, redshift evolution, radial selection function, etc. All 
these papers can reproduce the clustering of the full CMASS 
sample. 

Recent papers show tensions between models and 
observed data when a more careful selection is made. Guo et al. 
(2015b) study a volume-limited luminous red galaxy sample 
in the redshift range 0.48 < z < 0.55 of the CMASS 
sample. They need a galaxy velocity bias to describe the 
clustering of the most massive galaxies ($\sim 10^{13} - 10^{14}\ h^{-1}$ 
$M_\odot$) using an HOD. Saito et al. (2015) present an extension of 
the HAM to describe the colour dependency of the clustering 
for the CMASS sample. Guo et al. (2016) present a 
comparison between HOD and HAM models; they also modify the 
standard HAM model in order to reproduce the clustering at 
different luminosity cuts. Favole et al. (2015b) present a study 
of the properties of the blue population compared to the red 
galaxies. They present a modified HOD which allows both 
samples to be included in the same mock catalogue. The clustering 
dependency on stellar mass (luminosity) is not implemented 
in our model and we do not distinguish between blue and red 
galaxies. Our implementation of the HAM and stellar mass 
incompleteness is capable of reproducing the full CMASS 
sample, including a large amount of data in our analysis. Zu 
& Mandelbaum (2015) present a modified HOD in order to 
include the stellar mass incompleteness (iHOD). This model 
combines galaxy clustering and galaxy-galaxy lensing and 
increases the number of modelled galaxies by ~80% 
compared to traditional HOD models. 

We find the largest discrepancy between our model and 
the data in the quadrupole measurements (Figure 8). For 
scales larger than 10 $h^{-1}$ Mpc, this difference is within the 
3-$\sigma$ errors. The disagreement for s < 1 $h^{-1}$ Mpc is larger 
than 20%. However, this can be due to the uncertainties 
introduced by the fiber collisions at those scales and to effects 
of the resolution of the simulation. Therefore, we focus 
our attention on scales larger than 5 $h^{-1}$ Mpc, where the 
impact of fiber collisions is smaller. 

In order to study the clustering in different redshift 
bins using the HAM implemented in this work, we divide 
the full range into three bins. We select approximately the 
same number of galaxies in each redshift bin in order to 
have similar statistics in all of them. We perform an 
abundance matching (with scatter values that vary from 0.05 
to 0.5) for each range to fit the monopole. Figure 15 shows 
the monopole and quadrupole for the three different redshift 
bins. The discrepancy in the quadrupole can be due to one 
or more of the approximations used in this work. Possible causes 
of this discrepancy are enumerated below. 

(i) Guo et al. (2015b) find similar discrepancies in the 
quadrupole in configuration space for scales > 5 $h^{-1}$ Mpc. 
They argue that the underestimation of the quadrupole on 
large scales is possibly due to the correlated neighbouring 
bins in the covariance matrix. They obtain a reasonable 
fit even with this feature of the predicted quadrupole. 

(ii) Montero-Dorta et al. (2014) show that the intermediate 
redshift bin (0.51 < z < 0.57) is the most complete 
region in the CMASS sample. The standard HAM can 
reproduce the monopole and quadrupole for this redshift bin (see 
Figure 15), but cannot reproduce the quadrupole for the 
other two bins. The CMASS DR12 sample shows small 
variations in the monopole. However, the quadrupole changes, and 
it becomes similar for the two redshift ranges where the 
incompleteness of the sample is larger (Figure 16). 

Figure 15. Monopole and quadrupole of the redshift-space correlation function of the CMASS DR12 sample compared to the 
HAM-BigMD-BOSS light-cone for three redshift bins. The monopole is fitted for all redshift ranges. The middle bin is the most complete range 
in the CMASS sample, and also has the best reproduced quadrupole. We perform a HAM with three different scatter parameters to fit each 
of the redshift bins. Differences at low and high redshift can be due to target selection effects that we do not include in this study. Another 
source of discrepancy can be the relation between the scatter and the more massive galaxies (Saito et al. 2015). 

Figure 16. Correlation function for the CMASS sample in three 
redshift bins. Top panel: Monopole, with small variations in time. 
Bottom panel: The quadrupole for the selected ranges. In 
contrast with the monopole, the quadrupole shows larger variations 
for the different redshifts. 

(iii) The values of scatter used to fit the monopole of the 
correlation function in the different redshift bins vary over a 
wide range. This can be due to the evolution of the 
number density in the CMASS sample and some 
approximations used in this work. Leauthaud et al. (2016) show a 
non-negligible evolution of the SMF at low redshift compared 
with the complete redshift range (0.43-0.7). Our 
approximation of a non-evolving SMF could overestimate the 
incompleteness in the low redshift range (Figure 5, left panel), so 
the scatter necessary to reproduce the observed correlation 
function will be smaller. We also assume a constant mean 
scatter, but in fact the scatter depends on the stellar mass: it 
increases with the mass of the galaxies (Trujillo-Gomez et al. 
2011; Reddick et al. 2013). This dependency can explain 
why the scatter needed to reproduce the clustering of the 
low redshift range is smaller than the one used in the 
middle redshift range. At low redshift the number density is equal to 
$3.466 \times 10^{-4}\ h^{3}$ Mpc$^{-3}$, which is smaller than $3.942 \times 10^{-4}$ 
$h^{3}$ Mpc$^{-3}$ for the middle redshift. If both samples were 
complete, we would expect a larger scatter in the first range. 
However, due to the large incompleteness in the high-mass end at 
low redshift, the mean mass of this sample is $1.86 \times 10^{11}\ M_\odot$ 
compared to $2.04 \times 10^{11}\ M_\odot$ for the second redshift range. 
For this reason, the scatter needed to reproduce the 
clustering is smaller in the low redshift range. In the high redshift 
bin, we can only see very massive galaxies (see Figure 5, 
right panel) compared to the whole population of galaxies 
in the CMASS sample. This range is complete in the high-mass 
end and, compared to the other two redshift ranges, 
has a very small number density ($1.534 \times 10^{-4}\ h^{3}$ Mpc$^{-3}$), 
which implies a larger mean mass ($2.63 \times 10^{11}\ M_\odot$) and scatter 
than for the other samples. 

(iv) We have added a simple model for the stellar mass 
incompleteness in the CMASS sample. However, there can 
be other effects of the incompleteness in the target selection 
that cannot be modelled in this simple way. Although the 
selection is designed to select LRGs, an incomplete blue 
cloud is present in the sample and its fraction compared to the 
red sequence evolves with redshift (e.g., Guo et al. 2013; 
Montero-Dorta et al. 2014). Those two populations can live 
in different kinds of halos, and therefore they should be 
described by different scatter values. The errors introduced by 
this effect can increase with redshift, because the fraction of 
blue galaxies increases as well. As opposed to the low 
redshift bin, the high redshift bin is complete in the high-mass 
end (Figure 5, z = 0.65), but the fraction of blue galaxies is 
larger than in the middle bin, which can affect the prediction 
of the quadrupole. The presence of a small fraction of the 
so-called “green valley” can also introduce small errors in our 
modelling. 

(v) The number density in the high redshift bin (0.57 < 
z < 0.70) is very small compared to the middle redshift 
range. In this region, the fraction of small galaxies decreases 
and the impact of the most massive objects on the clustering 
becomes stronger. Guo et al. (2016) and Saito et al. (2015) 
need to modify the HAM model when colour cuts are 
applied. In addition, Guo et al. (2015b) show the need to 
introduce a velocity bias in the HOD to reproduce the most 
massive galaxies. If the standard HAM does not describe 
the clustering of the most massive galaxies, HAM mocks 
that model samples such as CMASS in the redshift range 
0.57 < z < 0.70 will not accurately reproduce the clustering 
of the observed data. 

(vi) In addition, recent papers report results for Luminous 
Red Galaxy samples where the number of 
mis-central galaxies in halos is significant and larger than expected (e.g., 
Hoshino et al. 2015), or which show off-centering of 
central galaxies (e.g., Hikage et al. 2013). The implementation 
of these results in the construction of mocks reproducing 
LRG samples could also modify the quadrupole. 


7 SUMMARY 

We investigated the galaxy clustering of the BOSS CMASS
DR12 sample using light-cones constructed from the BigMDPL
simulation. We performed a HAM to populate the dark
matter halos with galaxies using the Portsmouth DR12 stellar
mass catalogue. In addition, the stellar mass distribution
was modelled to take into account the incompleteness in stellar
mass of the CMASS sample. Our study included
features such as the survey geometry, veto masks and fiber collisions.
The combination of HAM and the BigMDPlanck simulation
provides results in good agreement with the observed
data. Our results show that HAM is an extremely useful method
for studying the relation between dark matter halos
and galaxies, and can be very helpful in the production of
mock catalogues (Kitaura et al. 2016, companion paper).
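As an illustration of the idea behind HAM with scatter, the following is a minimal sketch and not the pipeline used for the BigMD-BOSS light-cone. It assumes a multiplicative Gaussian scatter applied to V_peak before rank-ordering, and it omits the incompleteness modelling, light-cone construction and observational effects described in this paper. The function name `ham_assign` and its arguments are hypothetical.

```python
import numpy as np

def ham_assign(vpeak, stellar_masses, sigma_ham, rng):
    """Assign stellar masses to halos by rank after scattering V_peak.

    Assumption (illustrative only): the scatter is a multiplicative
    Gaussian perturbation, V_scat = V_peak * (1 + N(0, sigma_ham)).
    Halos left without a galaxy get NaN.
    """
    vpeak = np.asarray(vpeak, dtype=float)
    # Target stellar masses, sorted from most to least massive.
    masses = np.sort(np.asarray(stellar_masses, dtype=float))[::-1]
    if masses.size > vpeak.size:
        raise ValueError("more galaxies than halos")
    # Perturb the halo proxy, then rank halos by the perturbed value.
    v_scat = vpeak * (1.0 + rng.normal(0.0, sigma_ham, size=vpeak.size))
    order = np.argsort(v_scat)[::-1]  # halo indices, highest V_scat first
    mstar = np.full(vpeak.size, np.nan)
    # The i-th ranked halo receives the i-th most massive galaxy.
    mstar[order[:masses.size]] = masses
    return mstar

# Toy usage: four halos, three galaxies (log10 stellar masses), no scatter.
rng = np.random.default_rng(0)
toy_vpeak = np.array([100.0, 300.0, 200.0, 50.0])
toy_mstar = ham_assign(toy_vpeak, [10.5, 11.5, 11.0], 0.0, rng)
```

With zero scatter the assignment is purely monotonic in V_peak. Increasing `sigma_ham` lets lower-V_peak halos host more massive galaxies, which lowers the clustering amplitude of a stellar-mass-selected sample.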

Our main results can be summarised as follows. 

(i) We model the observed monopole in configuration
space using HAM. Assuming a complete sample, the scatter
parameter is very large compared to previous studies.
Modelling the stellar mass incompleteness significantly
decreases the value of the scatter to $\sigma_{\rm HAM}(V_{\rm peak}|M_*) = 0.31$. Our
model reproduces the observed monopole on nearly all
scales.

(ii) The prediction of the quadrupole in configuration
space appears to be in disagreement with the observed data.
We present possible explanations of this disagreement. In
future work, we will concentrate on reducing the possible
systematics, in order to better understand the limits of our
model.

(iii) We compute the projected correlation function and
the three-point correlation function, finding good
agreement between the model and the observed data within 1-$\sigma$
errors for most of the scales. For scales $\theta \sim 0$ and $\theta \sim \pi$, the
differences are of the order of 2-$\sigma$ errors, which can be related
to the same factors behind the disagreement in the quadrupole.
The monopole in k-space of the BigMD-BOSS light-cone is
in remarkable agreement with the measurement from the
CMASS sample corrected for fiber collisions (~10% of
difference at k = 0.9). The same agreement is found when we
use nearest neighbour weights, which shows that the
assignment of fiber collisions in the light-cone can reproduce the
observed data.

(iv) We compare our prediction of the stellar to halo mass
relation with lensing measurements. The results are in
good agreement with the observed data. The remaining
differences with observations reflect our assumption of
a constant scatter. Lensing measurements suggest the need to include
a stellar mass dependence in the scatter of the HAM.

The BigMD-BOSS light-cone is publicly available. It
can be found in the SDSS SkyServer. The current version
includes angular coordinates (ra, dec), redshift in real
space and redshift space, peculiar velocity along the line of sight,
$M_{200}$, $V_{\rm peak}$ and $M_*$. Properties of galaxies such as effective
radius ($R_{\rm eff}$), velocity dispersion ($\sigma_v$) and mass-to-light ratio
($M/L$) will be included in future updates.
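For readers working with such a catalogue, the relation between the real-space redshift, the line-of-sight peculiar velocity and the redshift-space redshift can be sketched as below. This is an illustration under stated assumptions: it uses the standard non-relativistic Doppler composition, takes the velocity in km/s and positive away from the observer, and uses a hypothetical function name. The sign and unit conventions of the released catalogue should be checked against its documentation.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def redshift_space_z(z_real, v_los_km_s):
    """Combine cosmological redshift and line-of-sight peculiar velocity.

    Uses 1 + z_s = (1 + z_real) * (1 + v_los / c), valid for v_los << c.
    Assumption: v_los is in km/s and positive for motion away from
    the observer.
    """
    z_real = np.asarray(z_real, dtype=float)
    v_los = np.asarray(v_los_km_s, dtype=float)
    return (1.0 + z_real) * (1.0 + v_los / C_KM_S) - 1.0

# Toy usage: a galaxy at rest and one receding at 0.1% of c, both at z = 0.5.
z_s = redshift_space_z([0.5, 0.5], [0.0, 299.792458])
```

A receding galaxy is shifted to a higher observed redshift by (1 + z_real) v_los / c, which is the origin of the redshift-space distortions probed by the quadrupole.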